Deflakes Primary COB growth with inactive replica #2715
Conversation
Signed-off-by: Sarthak Aggarwal <[email protected]>
Force-pushed from f9f94d9 to db6e772
Codecov Report: ✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##           unstable    #2715    +/-   ##
============================================
+ Coverage     72.40%   72.62%   +0.21%
============================================
  Files           128      128
  Lines         71273    71273
============================================
+ Hits          51606    51759     +153
+ Misses        19667    19514     -153
LGTM
Can you give some clues? How (or where) did you determine the timeout, and why did increasing `rdb-key-save-delay` help? We can add more info to the top comment and then merge it. The analysis is also very useful for these timing issues.
@enjoy-binbin thank you for taking a look. I shared my thought process in the PR description, please let me know if it doesn't make sense!
Resolves #2696

The primary issue was that under sanitizer mode, the test needed more time for the primary's replication buffers to grow beyond `2 × backlog_size`. Increasing `repl-timeout` to 30s ensures that the inactive replica is not disconnected while the full sync is in progress. `rdb-key-save-delay` throttles the data written to the client output buffer, and with it we can deterministically complete the full sync within 10s (10000 keys * 0.001s). Increasing the `wait_for_condition` retries gives the test enough attempts to verify that `mem_total_replication_buffers` reaches the required `2 × backlog_size`.

The test has passed for the past 7 consecutive iterations of `test-sanitizer-address` in my daily runs. I also amended the log message to show the current backlog size if it doesn't reach 2x.
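As a sanity check on the arithmetic above (an illustrative sketch, not part of the Valkey test suite; the key count and per-key delay are taken from the PR description, and `rdb-key-save-delay` is assumed to be expressed in microseconds per key):

```python
# Estimate the full-sync duration implied by the test's throttling settings,
# and check it fits inside the raised repl-timeout.

KEYS = 10_000            # keys written before the full sync (from the test)
SAVE_DELAY_US = 1_000    # rdb-key-save-delay per key: 0.001s = 1000 microseconds
REPL_TIMEOUT_S = 30      # repl-timeout raised by this PR

fullsync_seconds = KEYS * SAVE_DELAY_US / 1_000_000
print(f"estimated full sync time: {fullsync_seconds:.0f}s")

# The inactive replica must not be dropped mid-sync, so the throttled
# full sync has to finish well before repl-timeout expires.
assert fullsync_seconds < REPL_TIMEOUT_S
```

With these numbers the full sync takes roughly 10s, leaving a 3x margin against the 30s timeout even under sanitizer slowdown.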